强大的增强学习(RL)考虑了在一组可能的环境参数值中最坏情况下表现良好的学习政策的问题。在现实世界环境中,选择可靠RL的可能值集可能是一项艰巨的任务。当指定该集合太狭窄时,代理将容易受到不称职的合理参数值的影响。如果规定过于广泛,则代理商将太谨慎。在本文中,我们提出了可行的对抗性鲁棒RL(FARR),这是一种自动确定环境参数值集的方法。 Farr隐式将可行的参数值定义为代理可以在足够的培训资源的情况下获得基准奖励的参数值。通过将该问题作为两人零和游戏的配方,Farr共同学习了对参数值的对抗分布,并具有可行的支持,并且在此可行参数集中进行了强大的策略。使用PSRO算法在这款FARR游戏中找到近似的NASH平衡,我们表明,接受FARR训练的代理人对可行的对抗性参数选择比现有的minimax,domain randanmization,域名和遗憾的目标更强大控制环境。
translated by 谷歌翻译
在竞争激烈的两种环境中,基于\ emph {double oracle(do)}算法的深度强化学习(RL)方法,例如\ emph {policy space响应oracles(psro)}和\ emph {任何时间psro(apsro)},迭代地将RL最佳响应策略添加到人群中。最终,这些人口策略的最佳混合物将近似于NASH平衡。但是,这些方法可能需要在收敛之前添加所有确定性策略。在这项工作中,我们介绍了\ emph {selfplay psro(sp-psro)},这种方法可在每次迭代中的种群中添加大致最佳的随机策略。SP-PSRO并不仅对对手的最少可剥削人口混合物添加确定性的最佳反应,而是学习了大致最佳的随机政策,并将其添加到人群中。结果,SPSRO从经验上倾向于比APSRO快得多,而且在许多游戏中,仅在几次迭代中收敛。
translated by 谷歌翻译
Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.
translated by 谷歌翻译
Large language models trained for code generation can be applied to speaking virtual worlds into existence (creating virtual worlds). In this work we show that prompt-based methods can both accelerate in-VR level editing, as well as can become part of gameplay rather than just part of game development. As an example, we present Codex VR Pong which shows non-deterministic game mechanics using generative processes to not only create static content but also non-trivial interactions between 3D objects. This demonstration naturally leads to an integral discussion on how one would evaluate and benchmark experiences created by generative models - as there are no qualitative or quantitative metrics that apply in these scenarios. We conclude by discussing impending challenges of AI-assisted co-creation in VR.
translated by 谷歌翻译